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ON  SOME  NUMERICAL  PROPERTIES  OF  ARMA  PARAMETER  ESTIMATION  PROCEDURES 


II.  Joseph  Newton 
Institute  of  Statistics, 
Texas  A&M  University 


Abstract 

This  paper  reviews  the  algorithms  used  by  statisticians  for 
obtaining  efficient  estimators  of  the  parameters  of  a  univari¬ 
ate  autoregressive  moving  average  (ARMA)  time  series.  The 
connection  of  the  estimation  problem  with  the  problem  of  pre¬ 
diction  Is  investigated  with  particular  emphasis  on  the  Kalman 
filter  and  modified  Cholesky  decomposition  algorithms.  A 
result  from  prediction  theory  Is  given  which  provides  a  signi¬ 
ficant  reduction  In  the  computations  needed  in  Ansley’s  (1979) 
estimation  procedure.  Finally  it  is  pointed  out  that  there  are 
many  useful  facts  In  the  literature  of  control  theory  that  need 
to  be  investigated  by  statisticians  interested  in  estimation 
and  prediction  problems  in  linear  time  series  models. 
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1.  INTRODUCTION 


nite  order  autoregressive  process. 


Let  {e(t),tcZ),  Z  che  set  of  integers,  be  a 
white  noise  time  series  of  uncorrelated  zero 
mean  random  variables  having  common  variance 
o2.  Then  the  time  series  (Y(t),teZ)  satisfy¬ 
ing 

p  9 

Y (t)  -  -Z  a(j)Y(t-j)+e(t)+  Z  8(k)c(t-k)  (1.1) 

J«1  k«l 

is  called  an  autoregressive  moving  average  pro¬ 
cess  of  order  p  and  q  (ARMA(p,q)).  If  p=0  then 
Y(-)  is  a  moving  average  process  of  order  q, 
MA(q),  while  if  q-0  Y(*)  is  an  autoregressive 
process  of  order  p,  AR(p). 

Defining  a(0)-8(0)«l  and  the  complex  valued  poly¬ 
nomials 

p  1  q  k 

g(z)«  E  a(j)zJ  ,  h(z)-  Z  8(k)z  , 

J-0  k-Q 

we  can  write  (1.1)  as 

P  q 

Z  a(J)Y(t-J)-  Z  B(k)c(t-k) 
j-0  k-0 

or 

»a)Y(t)-h(L)c(t), 

where  L  Is  the  lag  or  back  shift  operator, 

L^Y(t)»Y(t-J) ,  JcZ.  If  the  zeros  of  g(0  are 
outside  the  unit  circle  then  Y(-)  Is  stationary, 
l.e.  It  can  be  written  as  an  infinite  order 
moving  average  process,  while  if  the  zeros  of 
h(-)  are  outside  the  unit  circle  then  Y(-)  Is 
Invertible,  l.e.  It  can  be  written  as  an  lnfi- 


The  ARMA  model  has  been  very  useful  In  maly- 
zlng  time  series  data.  Given  a  sample  reali¬ 
zation  YT=(Y(1) , . . . ,Y(T))  from  Y(-)  one  seeks 
T  .  T  .  . 

estimators  a=(a(l) .... ,a(p))  ,  §-(0(1),.,, 

»  T  o 

B(q))‘,  and  oz  of  the  parameters  a, (5,  and  o  , 

as  well  as  memory-t,  horizon-h,  minimum  mean 

square  error  linear  predictors  and  prediction 

variances  Y(t+h|t),  o2  -E{Y(t+h)-Y(t+h| t) }2 
C  f  Cl 

of  Y(t+h)  given  Y(l) , . . .  ,Y(t)  for  a  variety  of 
memories  and  horizons  t-t^, . . . .t^,  h-h^,...,!^. 

Thus  Y(t+h|t)”iy  .Y  where  X  .  minimizes 
~  c , n- 1  i  n 

S({)-E{Y(t+h)-£TYt}2  , 

and  o2  -S(A  .  ) .  This  gives  (see  Whittle  (1963) 
t,h  -t,n 

p.47)  that  1  ,  satisfies  the  normal  equations 

-t,h 


rY,t-t,h"-t,h 


°t,h“RY(0)"Jt,h  rY,t^t,h’ 

where  r^  h“ (R^ (t+h-1) , . . . ,Ry(h))  with  Ry(v)- 

E(Y(t)Y(t+v)),vcZ,  and  Tv  is  the  (t*t)  Toeplltz 

*  •  c 

covariance  matrix  of  Y„\  i.e.  T„  .  has  (J.k)th 

I  g  t 

element  Ry(k-J).  We  can  thus  define  t  by  its 
first  row;  Ty  ^-TOEPLfR^ (0) . RY(t-l)). 


The  usual  assumption  made  for  doing  estimation 
and  prediction  in  ARMA  models  is  that  c(*)  (and 
thus  Y('))  is  a  Causslan  process.  Thus  the  maxi- 


Bum  likelihood  estimators  a, (5,  and  o'1  maximize 
the  Gauss  inn  llkctllinud 

L(a,B.o2/YT)-(?»)'T/2|rYTr,1exp(-‘iYjr^TYT), 
while  Y (t+h| t)  and  o2  .  are  the  conditional  mean 

t  ,  rt 

end  variance  of  Y(t+h)  given  Y (1) , . . . ,Y(t) . 

In  tills  paper  we_l)  review  attempts  liy  statisti¬ 
cian  9  to  obtain  a,§,  and  a2  (or  estimators 

asymptotically  equivalent  to  them),  2)  show  how 
recent  methods  are  closely  connected  with  finding 
predictors,  and  3)  propose  an  Improvement  of 
Ansley's  (1979)  estimation  procedure.  This  pro¬ 
cedure  Is  currently  regarded,  at  least  In  many 
situations,  as  the  most  numerically  efficient 
available.  Finally  we  will  bo  able  to  compare 
the  two  most  popular  methods  available. 

2.  APPROXIMATE  METHODS 

There  have  been  three  basic  types  of  procedures 
used  to  estimate  the  parameters  of  the  ARMA  model: 
1)  estimators  that  are  derived  heuristlcally  and 
then  shown  to  be  asymptotically  equivalent  to  the 
MLE,  2)  estimators  obtained  by  maximizing  a  func¬ 
tion  asymptotically  equivalent  to  the  likelihood 
L,  and  3)  directly  maximizing  L.  The  procedures 
have  evolved  as  both  computing  software  and  hard¬ 
ware  have  Improved. 

As  an  example  of  a  type  1  procedure  we  consider 
Hannan's  (1969)  method.  The  method  consists  of 
1)  finding  consistent  initial  estimators  q(®) 

and  8^®\  2)  performing  an  alternating  proced¬ 
ure  of  the  form  a(®)  *-b(  1 )  ,  B^'^-a^  )  ,a^ 1  ^-*B^2^  » 

3)  combining  to  obtain  asymptotically 

efficient  B,  6)  0  ■*  a,  and  5)  possibly  iterat¬ 
ing  the  process  by  returning  to  (2)  with  a  re¬ 
placing  at®) . 

The  Initial  estimators  at®)  and  are  obtained 

by  noting  the  following  facts  about  the  ARMA 
model: 

P 

1)  t  a(J)R^(J-v)-0  ,  v»q+l,...,q+p  (2.1) 

J-0  T 

2)  By  writing 

g(L)Y(t)-h(L)e(t)«X(t),  (2.2) 

we  note  that  X(-)*vMA(q,B.o2),l.‘e.  X(-)  la  an 
MA(q)  process  with  parameters  0  and  o2.  Thus 

by  the  first  equality  In  (2.2)  we  have 
P 

*,(*>"  z  atJ)a(k)MJ+v-k),veZ,  (2.3) 

*  J.k-0  * 

while  from  the  second  equality  we  have 


Rx(v)- 


q-|v| 

j2  E  0(k)B(k+|v|)  ,  I v | <q 
k-0 

0  ,  |v| >q 


(2.4) 


Then  defining  the  consistent  sample  autocovari¬ 
ances 

Mv>-|  T'H*(t)Y(t+|v|)  ,  |v|  <  T  , 

t-1 

at®)  is  found  by  solving  (2.1)  with  Rj.(v) 
replacing  R^(v),  while  g(0)  Is  obtained  by  first 
using  at®)  and  R^O)  In  (2.3)  to  obtain  con¬ 
sistent  estimators  R^®)  (0) , . . .  .R^®)  (q)  which 
are  then  used  In  (2.4)  to  get  §(0)  via  Wilson's 
(1969)  or  Bauer's  (195S)  algorithm. 


Steps  2  and  4  of  Hannan’s  method  are  performed 
in  the  frequency  domain  by  noting  from  (2.2) 
that  the  spectral  density  functions  fy(')> 

fx(-),  and  fc(-)  of  Y(),  X(-),  and  e(-)  are 

related  by  (since  f£(w)“  a2/ 2*  ,  ve[0,2ir]) 

|g(eiM)|2fv(w)-  §£  | h(eiw) | 2  »fx(w).  (2.5) 


Then  we  can  rewrite  (2.5)  as,  dropping  arguments 
for  convenience. 


^/o2 

2ir 


ThT 


(2.6) 


fY  o2  1 

ThF  “  ^  TiT 

(2w/o2)2[g|2fY  4„2/g2  i 

[hp  2"  ThT 


(2.7) 


(2.8) 


Thus  the  right  hand  sides  of  (2.6), (2.7),  and 
(2.8)  are  In  the  form  of  an  autoregressive  spect¬ 
ral  density,  and  thus  for  v>0. 


q  2*  | g| 2f 

£  6(J )/  -77- 
J-o  0  lx 


Y 


#i(J-v)w 


dv  ■ 


«  *4 

V  0* 


(2.9) 


2*  f. 


1  °^)/  ThT7  * 
J-0  0  |n| 


v  i(J-v)w 


dw 


6  o2 
v 


q  2s 

l  S(J)/ 

J-0  0 


lg!2f, 


I  .Kj-Vjw^  .  6  0J 


(2.10) 

(2.11) 


Mow  f^  can  be  estlaatea  by  the  perlodogram 

V*»  -  £ ,  «,<»>•■'”  • 

|v|  <T 

and  since  X(«)  -  MA(q ,  8,  o2)  we  have  that 


Then  for  example,  In  nn  obvious  notation,  the 
step  a(0)  -*•  8^  In  done,  by  solving  (2.9)  with 
g^°\  f^,  and  f£°^  replacing  g,  fy,  and  f^  and 

the  fast  Fourier  transform  used  to  calculate  a 
rectangular  sum  approximation  to  the  Integrals 

needed.  Then  (2.10)  Is  used  for  8^  *  and 

§  -  a  while  (2.11)  Is  used  for  a(l)  -  §(2). 

Finally  the  combination  of  8^'^  and  Is  done 

using  an  analogy  with  two-stage  least  squares. 

In  a  pivotal  article,  Akaike  (1973)  noted  that 
Hannan's  method  was  the  same  ns  one  step  In 
directly  maximizing  L  using  the  Newton-Raphson 
procedure  with  nn  approximation  to  the  Hessian 
matrix  of  L.  This  observation  led  many  re¬ 
searchers  to  turn  their  attention  to  other  pos¬ 
sible  approaches  to  directly  maximize  L.  We  will 
consider  two  of  these  In  the  next  section. 

As  an  example  of  a  type  2  estimation  procedure 
we  consider  the  method  of  Box  and  Jenkins  (1970). 

T  -1 

In  large  samples  the  term  exp(-%Y_rY  ^.Y^.)  domi¬ 
nates  the  term  Ir^  t|  Thus  Box  and  Jenkins 

suggest  maximizing 

L'( a,  8,  o2|YT)  -  exp(-  Y^r^TYT) 

which  can  be  done  la  a  nonlinear  regression 
framework.  Thus  It  can  be  shown  that 
T 

iKt^t  '  1  [e(t)l2  (2a2) 

where  [e(t)]  «  E(e(t)|YT,  a,  8).  Thus  the  Box- 

Jenkins  procedure  consists  of  replacing  by 
-Q  in  (2.12)  and  approximating  (c (t) ]  by  back- 
forecasting. 

It  is  generally  agreed  that  in  the  case  of  large 
T  and/or  zeros  of  h(z)  far  from  the  unit  circle 
that  estimation  methods  of  type  1  and  type  2 
give  very  reasonable  results.  However  if  one 
is  faced  with  a  situation  where  the  above  con¬ 
ditions  are  not  satisfied,  then  the  numerical 
properties  of  the  procedures  nullify  the  bene¬ 
fit  of  their  simplicity.  Thus  Iterations  often 
fall  to  converge  In  a  reasonable  time  and  sys¬ 
tems  of  equations  that  must  be  solved  become 
highly  Ill-conditioned.  Thus  there  has  been  a 
need  for  more  numerically  stable,  exact  maxi¬ 
mum  likelihood  methods. 

3.  EXACT  MAXIMUM  LIKELIHOOD  .PROCEDURES 

With  the  advent  of  iterative  nonlinear  optimi¬ 
zation  procedures  which  require  only  a  starting 
value  and  evaluation  of  the  function  to  be 
optimized  for  given  values  of  Its  arguments 
(see  Dennis  and  More  (1977),  for  example),  the 
most  recent  procedures  for  ARMA  parameter  esti¬ 
mation  have  centered  on  evaluating  L  for  given 
values  of  a,  8,  and  o2. 


In  this  'icd  Ion  we  consider  two  such  methods 
for  evaluating  1)  using  the  Kalman  filter 
algorithm  and  7)  using  covnrlnnce  matrix  de¬ 
composition  methods,  particularly  the  Cholesky 
decomposition  algorithm.  The  basic  purpose  of 
these  algorithms  Is  to  obtain  the  one  step  ahead  - 

prediction  errors  e(t)  -  Y(t)  -  Y(t|t  -  1)  and 
prediction  variances  n2  «  o2_^  t  -  1 .  T 

since  one  con  cosily  show  that 

|rv  |  -  no2  yIx"1  y  -  r  . 

Y.T  t_1  t  .T  Y,T.T  t-1  Ot 

To  summarize,  the  two  methods  currently  regarded 
as  most  numerically  efficient  for  maximizing  L 
consist  of  two  stages: 

1)  Find  initial  estimators  as  described  above 
for  Hannan's  method. 

2)  Use  an  Iterative,  derivative  free  nonlinear 
optimization  algorithm  to  find  the  maximum 
likelihood  estimators,  using  either  the 
Kalman  filter  algorithm  (Akaike  (1974), 

Harvey  and  Phillips  (1979),  Jones  (1980), 
Pearlman  (1980),  or  Gardner,  Harvey,  and 
Phillips  (1980),  for  example)  or  the  Cho¬ 
lesky  decomposition  algorithm  (see  Pagano 
and  Parzen  (1973),  Pagano  (1976),  Phadke 
and  Kedem  (1978),  Ansley  (1979),  Newton 
(1980),  Newton  and  Pagano  (1981),  for  ex¬ 
ample)  to  find  the  e(t)  and  o2  needed  to 
evaluate  L. 

Kalman  Filter  Algorithm 

Consider  the  following  two  equations: 

Yt+j  ■  HtYt  +  GtUt  (State  Equation) 

Xt+^  -  St+1Yt+1  +  Vt+1  (observation  equation) 

where  the  Y's  are  unobservable  N-vectors,  the 
X' s  are  observable  M-vectors,  H^,  C^,  and  St  are 

known  (NxN),  (NxL),  and  (MxN)  matrices,  and  Ut, 

V  are  independent  zero  mean  L  and  N  dimensional 

white  noise  series  with  known  covariance  matrices 
Qt  and  R£. 

Then  given  Initial  values  Yq  and  Pq  *  va^Y^) 
one  finds  the  Y^'s  and  P£  «  cov(Yt)  by  the  Kal¬ 
man  filter  algorithm 

~t+l  "  -t+1  ~  Kt+l(St+l?t+l  *  -t+1* 

Pt+l  “  Vl  '  Kt+lWt+l 
Vi  ■  Vf  Vi  ■  Wt+lSI+l 


‘(Wt+lSt+l  *  Rt+1> 


Vl  ’  Htpt<  +  Wl 


A  nunber  of  authors  in  the  statistical  liters- 


turc  h.ivf*  pointed  out  that  a  almplo  nppl  trillion 
of  this  algorithm  to  ARMA  models  ran  be  made  to 
obtain  the  e(t)  and  o?.  *Thua  nlnce 
f  r 

Y(t+j)|t+l)  -  Y(t+j|t)  +  gjf. (t+1)  , 

j  -  l,  ....  n  -  1 
P 

I  a(k)Y(t-Hn-k|  t)  +  g  r(t+l), 
k-1  " 

«■  J  -  ■ 

J-l 

where  g  -  1  end  g.  -  8(J-1)  +  I  a(k)g.  .  ,J>2 
1  J  k-1  3  * 


and  m  -  max(p,  q+1),  we  can  write  the  equation 
of  atate 

!t+i  ’  H  !t  +  ?  e(t+1)  • 

where  gT  -  (gj,  g2 . gj ,  and 

H  -  f  0  1  0  ...  0  1 

0  0  1  ...  0 


0  0  0 
a(ra)  . 


...  a(l) 


where  a(J)  -  0  If  J  >  p. 

Further,  since  Y (t+1 | t+1)  -  Y(t+1),  we  can  write 
the  observational  equation 

“ $T  !t*i +  ?*♦»  • 

where  X(+1  -  Y(t+1),  ST  -  (1,  0,  ...,  0),  and 
is  an  observational  error  rand on  variable. 

Cholesky  Decomposition  Algorithm 

An  (nxn)  syimaetrlc  matrix  An  Is  positive  defi¬ 
nite  If  and  only  If  it  can  be  written 

A„  ’  la  „da  „  •  0.2) 

n  A  ,n  A ,n  A 

where  L.  Is  a  unit  lower  triangular  matrix  and 
A  |fl 

Is  a  diagonal  matrix  with  positive  diagonal 

elements.  This  Is  the  modified  Cholesky  decom¬ 
position  of  A  .  We  note  that  we  can  also  write 

n  T 

the  Cholesky  decomposition  A  -  M,  M,  where 
,  n  A ,n  A,n 

M.  "  L,  D,  but  we  will  consider  exclusively 
A,n  A,n  A,n 

the  modified  decomposition. 

The  decomposition  (3.2)  Is  unique  and  nested, 

— ’  **a, J ’  da,J*  LAJ’  daj  are  the  0*J)  princi¬ 
pal  minors  of  ,  D,  ,  l.  *  ,  and  D,^  respec- 
Atn  A#n  A,n  Atn  r 

tlvely  for  J  £  n.  Thus  we  can  denote  the  (],k)th 
element  of  Lv  and  Dv  in  the  decomposition  of 

I  I p c 

the  (KxK)  covariance  matrix  ^  ^  of  i  time  series 


Y(*)  m  l.y  .  .  nnd  D  respectively  for  f.  £  K. 

If  Y(*)  Is  a  purely  nondctermlnlstlc  covariance 
ntatlonnry  time  serlee  with  spectral  density 
function  t(*),  l.e. 

r* 

a2  -  2x  exp(-r-  /  log  f(w)dw)  >  0  , 

«  Z* 

then  e(l)  -  Y(l),  while 
t-1 

e(t)  -  Y(t)  ■  [  Ltf  .  e(t-k) ,  t  >  2 
k-1  T,t,c-It 

°t  “  °Y , t , t  * 

and  In  fact  (Newton  and  Pagano  (1981)) 
t+h-1 


Y(t+h | t) 


k5h  LY,t+h,t+h-kc(t+h_k) 


h-1 

,2  .  -  r  l2 


°t,h  ’  k^0  Y  ,  t+h , t+h-k °Y , t+h-k , t+h-k 
Further, 


11m  L  „  »  B  (J),  J  >  0 

K~  Y*K,K'J 


where  the  B„(-)  are  the  coefficients  In  the  MA(») 
representation  of  Y(-).  While  these  facts  ap¬ 
pear  to  be  known  In  the  literature  of  control 
theory  they  do  not  appear  to  exist  In  the  sta¬ 
tistical  literature,  at  least  In  the  general 
form  of  equatlona  (3.3)  through  (3.6). 

Now  major  simplifications  of  these  equations 
occur  when  Y(-)  Is  an  ARMA  process.  Thus  (Pa¬ 
gano  and  Parzen  (1973),  Pagano  (1976),  Newton 
(1980))  If  Y(.)  is  an  MA(q)  we  have  that  j  k 

-Oif  | J  — k |  >0  and  thus  L  .-OlfJ-k 

>  0.  Further,  Pagano  and  Parzen  (1973)  state 
In  sui  attempt  to  find  Y(t+h|t)  and  (j2  ^  that  if 

Y(.)  *  ARMA(p,  q,  a,  B>  o2),  and  one  forms  the 
P  *  * 

series  X(t)  -  t  a(J)Y(t-J),  t  >  p,  then 
J-0 

X(p+1) ,  ....  X(T)  1*  a  realization  from  an 
MA(q,  §.  o2)  process  and  one  can  combine  the 
MA(q)  prediction  algorithm  with  an  AR(p)  pre¬ 
diction  algorithm.  Ansley  (1979)  makes  this 
same  transformation  of  Y(*)  to  X(«)  to  get  the 
e(t)  and  a2  necessary  to  evaluate  L. 

The  methods  of  Pagano  and  Parzen  (1973)  and 
Ansley  (1979)  can  be  suraiarlzed  theoretically 
by  combining  the  equations  (3.3)  through  (3.6) 
above  with  the  following  theorem  due  to  Newton 
and  Pagano  (1981). 

Theorem  • 

Let  Y( •)  *  ARMA(p,  q,  ?,  fl,  o2)  and  Z(«) 

*  AR(p ,  a,  o2)  with  associated  covariance  matrix 

sequence.  ry>t  -  and 

°*.t4.f 


h,t  “  Lz,tLx,t  • 

°Y.t  “  °X,t  • 

where  Xt  -  (X(l) . X(t))T  -  L"1^  hns  jth 

element 


X(J)  -  Y(J) 

J-l 


.  J  ■  l 


E  a  (k)Y(j-k)  ,  j  -  2 . . 

■  k-0  J 
P 

E  a(k)Y ( j-k)  .  J  >  p 

k-0 

where  the  a^(*)  are  easily  obcalned  by  perform¬ 
ing  the  Levinson  (19/4)  recursion  for  J  -  p-1, 
....  1,  with  ap+100  -  n(k): 


Ojd)  - 


Vi(1)  -  V1(J+l)v1(J+1"1) 


1  <  j  <  p. 


1  ‘  “j+l<J+1> 


^X  l  )  ’  ^  a»l  (J-®) 

1 

*  z  a1_1<i-£)RY<Z-m),i,Jil 

£-max(l,l-p)1  1 


°t,h  nx,t+h-k,t+h-k 


* "  k-0  .  " 

t+h 

Kf-t+h-kLz*t+h,iLx*£*t+h-k|2  ‘ 

Thus  to  find  the  one  step  nhend  prediction 
errors  nnd  variances  one  need  only  compute  the 
q  nonzero  elements  of  the  successive  rows  of 

until  the  convergence  properties  (1.7)  and  (1.8) 
take  effect.  Further,  one  can  then  also  find 
the  Y(t+h|t)  f rom  these  same  quantities  nnd  the 
e(").  Note  however  that  to  find  o2  .  one  also 


needs  to  find  the  elements  of  L_ 


These  can 


be  found  simply  by  noting  thnt  the  upper  (pxp) 

principal  minor  I.  can  be  found  by  Inverting 
—  *P  _j 

the  lower  triangular  matrix  L_  ,  while 

Z,p 

LZ  J  k  "  UkiP  • 

where  y(0)  -  1,  y(I),  y(2),  ...  are  the  coef¬ 
ficients  in  the  MA  (“•)  representation  of  Z  ( *  )  . 
Further,  the  elements  In  rows  p+1 ,  ...  In  the 
first  (p-1)  columns  of  are  obtained  by 
P 


Z.p+j.k 


-  -  E  a(r)L 


Z,p+j-r,k 


,  1  £  k  <  p. 


,  .  q-li-Jl 

-  a2  E  8(k)8(k+|l-j|), 
k-0 

U-Jl  19  i.J  >  P 


10  If  1  £  J  <_  p,  i  >  p,  and  1  -  J  >  q 

or  If  l.J  >  p  and  |i-j|  >  q 


Z,p+j  ,k 


1  <  k  <  p  . 


Thus  the  predictors  and  prediction  variances 

Y(t+h|t)  and  a*  can  be  obtained  using  either 
t  ,n 

the  Kalman  filter  algorithm  or  the  Cholesky  de¬ 
composition  algorithm.  Also  the  convergence 
properties  described  in  the  Cholesky  algorithm 
can  be  Incorporated  Into  the  Kalman  filter  al¬ 
gorithm. 


If®  f'y  £  “  8(J),  J  •  1,  ...,  q  (3.7) 

*ia  DX,K,K  "  °2'  (3.8) 

K-h» 

Thus  e(l)  -  X(l),  while 

»in(q,j-l) 

e(t)  -  X(t)  -  E  l,  .  e(t-k) ,  t  >  1 

°t  ’  Dx,t,t  • 

and  In  fact 

p 

Y(t+b|t)  -  X(t+h|t)  -  E  o(J)Y(t+h-j | t)  , 


h  ■  l . q 


4.  DISCUSSION 

Thus  we  have  seen  that  algorithms  developed 
recently  for  finding  maximum  likelihood  esti¬ 
mators  of  ARMA  process  parameters  have  essenti¬ 
ally  consisted  of  applying  established  algor¬ 
ithms  for  finding  the  minimum  mean  square  error 
linear  one  step  ahead  predictors  and  prediction 
variances.  We  have  shown  how  the  Cholesky  al¬ 
gorithm  can  be  used  to  find  more  than  one  step 
ahead  predictors  nnd  variances  as  well.  We  note 
that  both  the  Kalman  filter  and  Cholesky  decom¬ 
position  algorithm  can  be  extended  easily  to 
the  multivariate  ARMA  case.  We  also  note  that 
we  have  not  attempted  to  survey  all  the  recent 
work  in  the  ARMA  estimation  area,  but  rather 
have  emphasized  those  algorithms  that  appear  to 
be  most  widely  used  in  the  literature. 

We  consider  next  the  relative  speed  and  stabil¬ 
ity  of  the  Kalman  filter  and  Cholesky  decom¬ 
position  algorithms.  Pearlman  (1980)  shows  that 


the  number  of  ope  rot  tons  needed  to  find  the  e(t) 
and  via  the  Kalman  filter  algorithm  Is  ap¬ 
proximately  T(2p  t  3m  +  3),  with  m  -  mnx(p,q+l), 
while  for  the  Cholesky  algorithm  It  Is 
T(p  +  \(q+’l) (q+4) ) .  Thus  If  q  >.  5  the  Kalman 
algorithm  Is  faster. 

Now  the  fact  that  the  Kalman  filter  algorithm 
performs  a  number  of  "matrix  squaring"  oper- 

T 

tlons  (for  example  II  1MI  In  (3.1)  has  enused 

many  Investigators  to  question  Its  stability. 
However  a  number  of  authors  have  suggested  meth¬ 
ods  for  avoiding  these  operations  (see  Paige 
(1976)  for  example).  The  Cholesky  decomposition 
Is  well  known  for  Its  numerical  stability  (see 
Wilkinson  (1967)  for  example).  However  more 
work  Is  needed  before  a  final  conclusion  can  be 
made  about  which  of  the  two  methods  Is  to  be 
preferred. 

Finally  we  point  out  that  there  are  a  variety 
of  numerical  methods  for  analyzing  linear  time 
series  models  In  the  work  of  control  theorists 
particularly  In  a  series  of  papers  of  Kallath 
(see  the  references  In  Aasnaes  and  Kallath 
(1973))  that  need  to  be  Incorporated  Into  the 
statistical  literature. 
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